Structural Compression Of Document Images With PDF/A

Authors

  • Sergey Usilin
  • Dmitry P. Nikolaev
  • Vassili V. Postnikov
Abstract

This paper describes a new compression algorithm for document images. The algorithm separates the text layer from the graphics layer in the source image and compresses each layer with the most suitable commonly used algorithm. The compressed layers are then placed into PDF/A, a standardized file format for long-term archiving of electronic documents. Using a separation algorithm tailored to each type of document makes it possible to store the image to best advantage. Moreover, the text layer can be processed by an OCR system, and the recognized text can be placed into the same PDF/A file to support copy-and-paste and text search.
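The layered idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how layers are separated or which codecs are used, so the fixed threshold, the helper names (`split_layers`, `pack_bits`, `compress_layers`), and the use of zlib (Deflate) for both layers are all assumptions made for illustration. A real pipeline would presumably pair a bilevel codec for the text mask with a continuous-tone codec for the graphics layer.

```python
import zlib


def split_layers(image, threshold=128):
    """Split a grayscale image (rows of 0-255 ints) into two layers.

    Assumption: every pixel darker than `threshold` is treated as text.
    Returns a 1-bit text mask and a graphics layer in which the text
    pixels have been replaced by white (255).
    """
    mask = [[1 if px < threshold else 0 for px in row] for row in image]
    graphics = [
        [255 if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]
    return mask, graphics


def pack_bits(mask):
    """Pack a 1-bit mask into bytes, padding each row to a whole byte."""
    out = bytearray()
    for row in mask:
        for i in range(0, len(row), 8):
            chunk = row[i:i + 8]
            byte = 0
            for bit in chunk:
                byte = (byte << 1) | bit
            byte <<= 8 - len(chunk)  # left-align a short final chunk
            out.append(byte)
    return bytes(out)


def compress_layers(image, threshold=128):
    """Compress each layer separately; zlib stands in for both codecs."""
    mask, graphics = split_layers(image, threshold)
    text_stream = zlib.compress(pack_bits(mask), 9)
    graphics_bytes = bytes(px for row in graphics for px in row)
    graphics_stream = zlib.compress(graphics_bytes, 9)
    return text_stream, graphics_stream


if __name__ == "__main__":
    # Hypothetical 4x16 page: light background with two dark "text" pixels.
    page = [[200] * 16 for _ in range(4)]
    page[1][3] = 10
    page[2][8] = 0
    text_stream, graphics_stream = compress_layers(page)
    print(len(text_stream), len(graphics_stream))
```

Keeping the two streams separate is what lets each one be handled by a codec suited to its content before both are stored in a single container file.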

Similar resources

Optimizing PDF output size of TeX documents

There are several tools for generating PDF output from a TeX document. By choosing the appropriate tools and configuring them properly, it is possible to reduce the PDF output size by a factor of 3 or even more, thus reducing document download times, hosting and archiving costs. We enumerate the most common tools, and show how to configure them to reduce the size of text, fonts, images and cros...

2 Transform and Encoding Algorithm

The present contribution proposes a new remarkably efficient image compression algorithm for graylevel images based on dyadic wavelet transformation. In order to achieve perfect reconstruction, orthogonal decomposition is applied. Scalar quantization of wavelet coefficients is combined with run-length coding. Code word assignment is performed by semi-adaptive Huffman coding (determined by validity t...

An introduction to source coding

Format: ePub / PDF / Kindle This book provides a global understanding of source coding with an overview of practical coding schemes for speech, music and images. The first section covers background, theoretical material. It shows how source coding...

Efficient document rendering with enhanced run length encoding

Document imaging and transmission systems (typically MFPs) require both effective and efficient image rendering methods that support standard data formats for a variety of document types, and allow for real time implementation. Since most conventional raster formats (e. g. TIFF, PDF, JPEG) are designed for use with either black and white text, or continuous-tone images, more specialized renderi...

A qualitative study analyzing how health experts use medical images

Introduction: In the health sector, an image functions as a form of document that can convey a considerable amount of information. Employing this type of information can increase the effectiveness of medical experts' performance. This study aimed to survey how health experts use medical images in their practice. Methods: This applied qualitative study was carried out in 1392 (2013). The study p...

Publication date: 2010